GSE97239 Analysis Notebook¶


Introduction¶

This notebook contains an analysis of GEO dataset GSE97239 (https://www.ncbi.nlm.nih.gov/gds/?term=GSE97239) created using the BioJupies Generator.

Table of Contents¶

The notebook is divided into the following sections:

  1. Load Dataset - Loads and previews the input dataset in the notebook environment
  2. PCA - Linear dimensionality reduction technique to visualize similarity between samples
  3. Clustergrammer - Interactive hierarchical clustering heatmap visualization
  4. Library Size Analysis - Analysis of readcount distribution for the samples within the dataset
  5. Differential Expression Table - Differential expression analysis between two groups of samples
  6. Volcano Plot - Plot the logFC and logP values resulting from a differential expression analysis
  7. MA Plot - Plot the logFC and average expression values resulting from a differential expression analysis
  8. Enrichr Links - Links to enrichment analysis results of the differentially expressed genes via Enrichr
  9. Gene Ontology Enrichment Analysis - Identifies Gene Ontology terms which are enriched in the differentially expressed genes
  10. Pathway Enrichment Analysis - Identifies biological pathways which are enriched in the differentially expressed genes
  11. Transcription Factor Enrichment Analysis - Identifies transcription factors whose targets are enriched in the differentially expressed genes
  12. Kinase Enrichment Analysis - Identifies protein kinases whose substrates are enriched in the differentially expressed genes
  13. miRNA Enrichment Analysis - Identifies miRNAs whose targets are enriched in the differentially expressed genes

Results¶

1. Load Dataset¶

Here, the GEO dataset GSE97239 is loaded into the notebook. Expression data was quantified as gene-level counts using the ARCHS4 pipeline (Lachmann et al., 2017), available at http://amp.pharm.mssm.edu/archs4/.

            GSM2560031  GSM2560032  GSM2560033  GSM2560034  GSM2560035  GSM2560036
A1BG                47          60          61          76          91          34
A1CF                 4          20          24          28          10           8
A2M                370         296         203        3926        1750         803
A2ML1               18         876          54          65          69          22
A2MP1                0          21          17          23          12           7

Table 1 | RNA-seq expression data. The table displays the first 5 rows of the quantified RNA-seq expression dataset. Rows represent genes, columns represent samples, and values show the number of mapped reads.

Sample_geo_accession  Sample Title  age(years)  gender
GSM2560031            Cancer 1      74          female
GSM2560032            Cancer 2      70          male
GSM2560033            Cancer 3      58          male
GSM2560034            Normal 1      74          female
GSM2560035            Normal 2      70          male
GSM2560036            Normal 3      58          male

Table 2 | Sample metadata. The table displays the metadata associated with the samples in the RNA-seq dataset. Rows represent RNA-seq samples, columns represent metadata categories.


2. PCA¶

Principal Component Analysis (PCA) is a statistical technique used to identify global patterns in high-dimensional datasets. It is commonly used to explore the similarity of biological samples in RNA-seq datasets. To achieve this, gene expression values are transformed into Principal Components (PCs), a set of linearly uncorrelated features which represent the most relevant sources of variance in the data, and subsequently visualized using a scatter plot.

[Interactive figure: PCA Analysis | Scatter Plot, colored by sample group (Normal, Human Bladder Cancer)]

Figure 1 | Principal Component Analysis results. The figure displays an interactive, three-dimensional scatter plot of the first three Principal Components (PCs) of the data. Each point represents an RNA-seq sample. Samples with similar gene expression profiles are closer in the three-dimensional space. If provided, sample groups are indicated using different colors, allowing for easier interpretation of the results.
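The transformation into principal components can be sketched in a few lines of Python. This is a minimal illustration using NumPy, not the code BioJupies runs: the counts-per-million and log2 preprocessing, the function name `pca_scores`, and the restriction to the five genes shown in Table 1 are all assumptions made for the example.

```python
import numpy as np

def pca_scores(counts, n_components=3):
    """Project samples onto their first principal components.

    counts: 2-D array-like, rows = genes, columns = samples (raw read counts).
    Returns (scores, explained_variance_ratio) for the requested components.
    """
    counts = np.asarray(counts, dtype=float)
    # Assumed preprocessing: scale each sample to counts per million, then log2.
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    logged = np.log2(cpm + 1.0)
    # PCA treats samples as observations, so transpose and center each gene.
    x = logged.T
    x = x - x.mean(axis=0, keepdims=True)
    # The SVD of the centered matrix yields the principal components.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]

# The five genes previewed in Table 1 (columns: GSM2560031 to GSM2560036).
table1 = [
    [47, 60, 61, 76, 91, 34],        # A1BG
    [4, 20, 24, 28, 10, 8],          # A1CF
    [370, 296, 203, 3926, 1750, 803],  # A2M
    [18, 876, 54, 65, 69, 22],       # A2ML1
    [0, 21, 17, 23, 12, 7],          # A2MP1
]
scores, explained = pca_scores(table1)
print(scores.shape)  # one row per sample, one column per component
```

Each row of `scores` gives the coordinates of one sample in the three-dimensional scatter plot. With only five genes the result is purely illustrative; a real analysis would use the full expression matrix.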


3. Clustergrammer¶

Clustergrammer is a web-based tool for visualizing and analyzing high-dimensional data as interactive and hierarchically clustered heatmaps. It is commonly used to explore the similarity between samples in an RNA-seq dataset. In addition to identifying clusters of samples, it also allows users to identify the genes that contribute to the clustering.

Figure 2 | Clustergrammer analysis. The figure contains an interactive heatmap displaying gene expression for each sample in the RNA-seq dataset. Every row of the heatmap represents a gene, every column represents a sample, and every cell displays normalized gene expression values. The heatmap additionally features color bars beside each column which represent prior knowledge of each sample, such as the tissue of origin or experimental treatment.


4. Library Size Analysis¶

To quantify gene expression in an RNA-seq dataset, reads generated from the sequencing step are mapped to a reference genome and subsequently aggregated into numeric gene counts. Due to experimental variations and random technical noise, samples in an RNA-seq dataset often have variable amounts of total RNA. Library size analysis calculates and displays the total number of reads mapped for each sample in the RNA-seq dataset, facilitating the identification of outlying samples and the assessment of the overall quality of the data.

[Interactive figure: Library Size Analysis | Bar Plot, million reads per sample. Axis label: Million Reads (ticks 0 to 5); one bar each for GSM2560031 through GSM2560036]

Figure 3 | Library Size Analysis results. The figure contains an interactive bar chart which displays the total number of reads mapped to each RNA-seq sample in the dataset. Additional information for each sample is available by hovering over the bars. If provided, sample groups are indicated using different colors, thus allowing for easier interpretation of the results.
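The library size calculation itself is a column sum over the count matrix. The sketch below shows the idea using only the five genes previewed in Table 1, so the totals it produces are far smaller than the real library sizes plotted in Figure 3; the function name `library_sizes` is hypothetical.

```python
# Sample accessions, in the column order of Table 1.
samples = ["GSM2560031", "GSM2560032", "GSM2560033",
           "GSM2560034", "GSM2560035", "GSM2560036"]

# Only the five genes previewed in Table 1, not the full dataset.
counts = {
    "A1BG":  [47, 60, 61, 76, 91, 34],
    "A1CF":  [4, 20, 24, 28, 10, 8],
    "A2M":   [370, 296, 203, 3926, 1750, 803],
    "A2ML1": [18, 876, 54, 65, 69, 22],
    "A2MP1": [0, 21, 17, 23, 12, 7],
}

def library_sizes(counts, samples):
    """Sum the mapped reads of every gene for each sample."""
    totals = [0] * len(samples)
    for gene_counts in counts.values():
        for i, value in enumerate(gene_counts):
            totals[i] += value
    return dict(zip(samples, totals))

sizes = library_sizes(counts, samples)
for sample, total in sizes.items():
    # Figure 3 reports the same quantity in millions of reads.
    print(f"{sample}: {total} reads ({total / 1e6:.6f} million)")
```

Applied to the full count matrix, the same sum gives the values shown as bars in the figure.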


5. Differential Expression Table¶

Gene expression signatures are alterations in the patterns of gene expression that occur as a result of cellular perturbations such as drug treatments, gene knock-downs or diseases. They can be quantified using differential gene expression (DGE) methods, which compare gene expression between two groups of samples to identify genes whose expression is significantly altered in the perturbation. The signature table is used to interactively display the results of such analyses.

Gene logFC AveExpr P-value FDR
*FHL1 -4.76 5.09 1.065855e-07 0.001590
*PI16 -4.63 3.61 1.152103e-07 0.001590
*PGM5 -5.04 6.96 1.985475e-07 0.001590
*ATF3 -4.43 6.05 2.007578e-07 0.001590
*C7 -4.89 4.56 2.478074e-07 0.001590
*DUSP2 -3.99 5.86 2.707542e-07 0.001590
*SCARA5 -5.02 4.05 3.417239e-07 0.001720
*ITGA8 -3.79 4.76 4.975488e-07 0.001937
*SDPR -4.83 2.81 5.373326e-07 0.001937
*CFD -4.36 6.65 5.497094e-07 0.001937
*COL21A1 -4.55 4.34 7.678082e-07 0.002460
*MATN2 -3.78 4.99 9.816865e-07 0.002662
*PYCR1 3.66 4.74 9.821799e-07 0.002662
*H3F3AP4 5.90 -0.03 1.164573e-06 0.002931
*MYOCD -4.14 3.97 1.253447e-06 0.002945
*ADGRB3 -4.05 3.80 1.368651e-06 0.002958
*CCNE1 3.77 2.80 1.543644e-06 0.002958
*SPARCL1 -3.75 3.94 1.613691e-06 0.002958
*SYNPO2 -4.63 6.39 1.663455e-06 0.002958
*ANKAR -3.25 4.40 1.744565e-06 0.002958
*ZNF208 -4.12 6.82 1.762912e-06 0.002958
*COL14A1 -3.44 6.07 2.196875e-06 0.003519
*ADH1B -3.72 3.99 2.551010e-06 0.003908
*KRT20 4.35 3.97 3.061917e-06 0.004496
*SFRP1 -3.55 3.81 3.527534e-06 0.004925
*LMOD1 -4.06 5.94 3.732110e-06 0.004925
*SLIT3 -3.08 5.75 3.814102e-06 0.004925
*NR4A1 -3.71 7.97 4.060386e-06 0.004925
*CLEC3B -3.82 2.92 4.118641e-06 0.004925
*TNXB -3.38 6.71 4.341386e-06 0.004925
*MAGEA6 5.58 1.83 4.576027e-06 0.004925
*SYNM -4.58 6.57 4.676161e-06 0.004925
*MYH11 -5.04 9.42 4.683525e-06 0.004925
*RP5-877J2.1 -5.07 0.43 4.875934e-06 0.004925
*MAGEA3 4.17 3.30 4.892195e-06 0.004925
*CTD-2547E10.6 -4.89 3.47 5.102475e-06 0.004994
*SPATA6 -3.13 4.81 5.538563e-06 0.005190
*ADAM33 -2.90 4.97 5.737533e-06 0.005190
*FGL2 -4.17 5.68 5.743586e-06 0.005190
*IL33 -3.91 1.78 6.262836e-06 0.005517
*AOX1 -3.30 2.72 6.823244e-06 0.005864
*RP11-497H16.5 -5.39 0.22 8.085313e-06 0.006761
*PTGS2 -3.61 3.92 8.330853e-06 0.006761
*KIF2C 2.96 3.61 8.726937e-06 0.006761
*ZNF593 3.46 1.96 8.765132e-06 0.006761
*PRKG1 -3.26 3.40 8.856972e-06 0.006761
*SPTBN2 2.91 6.12 9.094836e-06 0.006761
*PREX2 -3.32 4.06 9.462113e-06 0.006761
*SMOC2 -3.36 4.39 9.601907e-06 0.006761
*FASN 3.22 7.99 9.613455e-06 0.006761
*DPY19L2 -3.08 3.88 9.785384e-06 0.006761
*ROR1 -2.97 3.39 1.010304e-05 0.006830
*ASB2 -3.19 3.52 1.027234e-05 0.006830
*VGLL1 3.55 3.39 1.064247e-05 0.006945
*METTL24 -3.71 2.31 1.114547e-05 0.007141
*GRIN2D 3.45 4.14 1.161112e-05 0.007306
*IGFBP3 3.34 7.95 1.201528e-05 0.007318
*CKB -4.67 4.60 1.212224e-05 0.007318
*PID1 -3.53 2.30 1.225330e-05 0.007318
*CD69 -4.41 1.60 1.257140e-05 0.007383
*SLC8A1 -3.70 6.48 1.407632e-05 0.008000
*SEPHS2 3.00 4.47 1.423078e-05 0.008000
*PRUNE2 -3.50 4.93 1.430226e-05 0.008000
*KPNA2 2.93 3.80 1.565105e-05 0.008617
*ACTG2 -4.35 6.31 1.618841e-05 0.008776
*ARHGAP6 -2.70 3.89 1.674956e-05 0.008943
*MICU3 -3.58 2.73 1.830912e-05 0.009630
*CRB3 2.92 4.06 1.906380e-05 0.009758
*SORBS1 -3.41 6.17 1.910633e-05 0.009758
*PDE7B -3.49 3.92 1.948349e-05 0.009808
*CITED4 3.20 4.22 2.168594e-05 0.010700
*RCAN2 -2.72 3.32 2.186310e-05 0.010700
*RGS2 -3.19 3.60 2.225020e-05 0.010740
*OGN -4.01 0.89 2.335638e-05 0.010895
*FOS -4.15 8.89 2.361547e-05 0.010895
*RGS1 -3.47 4.20 2.375134e-05 0.010895
*ADCY5 -2.75 4.93 2.380768e-05 0.010895
*ZWINT 2.81 3.30 2.503302e-05 0.011081
*RIMS1 -3.24 3.80 2.531290e-05 0.011081
*COL4A6 -3.28 5.39 2.574028e-05 0.011081
*AKT3 -3.08 5.18 2.581187e-05 0.011081
*ADAMTS19 -2.87 3.15 2.620756e-05 0.011081
*RP11-206L10.8 -2.88 5.13 2.632450e-05 0.011081
*CNN1 -4.35 6.53 2.641362e-05 0.011081
*AL589743.1 2.91 5.46 2.687071e-05 0.011140
*IL1RAPL1 -3.22 1.73 2.775520e-05 0.011373
*PAFAH1B3 3.93 3.18 2.818201e-05 0.011400
*DCUN1D4 -2.81 5.00 2.846867e-05 0.011400
*CEP85L -2.92 4.51 2.965915e-05 0.011674
*PARM1 -2.72 3.81 2.981521e-05 0.011674
*KIAA1586 -3.22 2.33 3.221820e-05 0.012476
*AMY2B -2.99 5.20 3.472861e-05 0.013220
*SOBP -3.26 4.81 3.525751e-05 0.013220
*PTGS1 -3.77 4.96 3.531229e-05 0.013220
*UPK3B 3.16 5.92 3.563935e-05 0.013220
*KLF2 -2.61 4.71 3.625065e-05 0.013292
*TMEM132A 2.85 5.54 3.659009e-05 0.013292
*LMNB2 2.54 5.99 3.907613e-05 0.014051
*B3GNT3 2.41 4.42 4.113706e-05 0.014642
*TNIK -2.63 4.51 4.242493e-05 0.014950

Table 3 | Differential Expression Table. The browsable table contains the gene expression signature generated from a differential gene expression analysis. Every row of the table represents a gene; the columns display the estimated measures of differential expression. Links to external resources containing additional information for each gene are also provided.
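The FDR column adjusts the P-values for the number of genes tested. The notebook does not state which correction was applied; the sketch below assumes the widely used Benjamini-Hochberg procedure. The function name and the four example P-values are hypothetical and are not taken from Table 3.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P-values for four genes.
example = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(example))
```

Because the adjustment depends on the total number of tests, applying it to only the rows shown in Table 3 would not reproduce the FDR values listed there.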


6. Volcano Plot¶

Volcano plots are a type of scatter plot commonly used to display the results of a differential gene expression analysis. They can be used to quickly identify genes whose expression is significantly altered in a perturbation, and to assess the global similarity of gene expression in two groups of biological samples. Each point in the scatter plot represents a gene; the axes display the significance versus fold-change estimated by the differential expression analysis.

Figure 4 | Volcano Plot. The figure contains an interactive scatter plot which displays the log2-fold changes and statistical significance of each gene calculated by performing a differential gene expression analysis. Every point in the plot represents a gene. Red points indicate significantly up-regulated genes, blue points indicate down-regulated genes. Additional information for each gene is available by hovering over it.
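The coordinates of a volcano plot follow directly from the differential expression table: the x value is the logFC and the y value is the negative log10 of the P-value. The sketch below computes them for four rows copied from Table 3 plus one hypothetical non-significant gene. The cutoffs and the grey color for non-significant points are illustrative assumptions; the thresholds used by the notebook are not stated.

```python
import math

# (gene, logFC, P-value). The first four rows are copied from Table 3;
# the last row is a hypothetical non-significant gene added for contrast.
rows = [
    ("FHL1", -4.76, 1.065855e-07),
    ("PYCR1", 3.66, 9.821799e-07),
    ("KRT20", 4.35, 3.061917e-06),
    ("TNIK", -2.63, 4.242493e-05),
    ("HYPOTHETICAL1", 0.20, 0.6),
]

def volcano_point(log_fc, p_value, fc_cutoff=1.5, p_cutoff=0.05):
    """Return (x, y, color) for one gene; the cutoffs are assumptions."""
    y = -math.log10(p_value)
    if p_value < p_cutoff and log_fc >= fc_cutoff:
        color = "red"    # significantly up-regulated
    elif p_value < p_cutoff and log_fc <= -fc_cutoff:
        color = "blue"   # significantly down-regulated
    else:
        color = "grey"   # not significant under these cutoffs
    return log_fc, y, color

points = {gene: volcano_point(fc, p) for gene, fc, p in rows}
for gene, (x, y, color) in points.items():
    print(f"{gene}: x={x:.2f}, y={y:.2f}, color={color}")
```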


7. MA Plot¶

MA plots are a type of scatter plot commonly used to display the results of a differential gene expression analysis. They can be used to quickly identify genes whose expression is significantly altered in a perturbation, and to assess how fold-change estimates depend on expression level across two groups of biological samples. Each point in the scatter plot represents a gene; the axes display the average gene expression versus fold-change estimated by the differential expression analysis.

Figure 5 | MA Plot. The figure contains an interactive scatter plot which displays the average expression and log2-fold change of each gene calculated by performing differential gene expression analysis. Every point in the plot represents a gene. Red points indicate significantly up-regulated genes, blue points indicate down-regulated genes. Additional information for each gene is available by hovering over it.
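In its classical form, an MA plot places the log ratio of two conditions (M) on one axis and their mean log expression (A) on the other. The sketch below computes both for a single gene. The function name, the pseudocount, and the sign convention (perturbation minus control) are assumptions; the notebook presumably plots the logFC and AveExpr columns of Table 3 instead of recomputing them this way.

```python
import math

def ma_values(mean_control, mean_perturbation, pseudocount=1.0):
    """Return (M, A) for one gene from its mean expression in two groups.

    M is the log2 fold change (perturbation minus control) and A is the
    average log2 expression. The pseudocount avoids taking log2 of zero.
    """
    log_control = math.log2(mean_control + pseudocount)
    log_perturbation = math.log2(mean_perturbation + pseudocount)
    m = log_perturbation - log_control
    a = (log_perturbation + log_control) / 2
    return m, a

# Hypothetical gene: mean expression 1 in controls and 7 in the perturbation.
m, a = ma_values(1, 7)
print(m, a)
```

A positive M indicates higher expression in the perturbation under this convention, and genes with low A tend to have noisier M estimates.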


8. Enrichr Links¶

Enrichment analysis is a statistical procedure used to identify biological terms which are over-represented in a given gene set. These include signaling pathways, molecular functions, diseases, and a wide variety of other biological terms obtained by integrating prior knowledge of gene function from multiple resources. Enrichr is a web-based application which allows users to perform enrichment analysis using a large collection of gene-set libraries and various interactive approaches to display enrichment results.

Table 4 | Enrichr links. The table displays links to Enrichr containing the results of enrichment analyses generated by analyzing the up-regulated and down-regulated genes from a differential expression analysis. By clicking on these links, users can interactively explore and download the enrichment results from the Enrichr website.
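A common way to test whether a term is over-represented in a gene set is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below implements it with the Python standard library (Python 3.8 or later for `math.comb`). The function name and the example numbers are hypothetical, and the notebook does not state whether Enrichr's reported scores are computed exactly this way.

```python
from math import comb

def overrepresentation_pvalue(n_background, n_term, n_query, n_overlap):
    """Probability of seeing at least n_overlap term genes by chance.

    n_background: number of genes in the background set
    n_term:       number of background genes annotated with the term
    n_query:      number of genes in the query list (e.g. up-regulated genes)
    n_overlap:    number of query genes annotated with the term
    """
    total = comb(n_background, n_query)
    upper = min(n_term, n_query)
    # Sum the hypergeometric probabilities for overlaps >= n_overlap.
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 20000 background genes, a term with 100 genes,
# and a query list of 200 genes of which 8 carry the term.
p = overrepresentation_pvalue(20000, 100, 200, 8)
print(f"p = {p:.3g}")
```

In this example about one overlapping gene would be expected by chance (200 x 100 / 20000), so an overlap of eight yields a small P-value.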


9. Gene Ontology Enrichment Analysis¶

Gene Ontology (GO) is a major bioinformatics initiative aimed at unifying the representation of gene attributes across all species. It contains a large collection of experimentally validated and predicted associations between genes and biological terms. This information can be leveraged by Enrichr to identify the biological processes, molecular functions and cellular components which are over-represented in the up-regulated and down-regulated genes identified by comparing two groups of samples.

Normal vs Human Bladder Cancer | Gene Ontology Biological Process

Up-regulated in Human Bladder Cancer:
  liver morphogenesis (GO:0072576)
  negative regulation of cysteine-type endopeptidase activity involved in apoptotic process (GO:0043154)
  negative regulation of apoptotic process (GO:0043066)
  positive regulation of chromosome segregation (GO:0051984)
  oncogene-induced cell senescence (GO:0090402)
  DNA replication-independent nucleosome assembly (GO:0006336)
  nucleosome assembly (GO:0006334)
  G1/S transition of mitotic cell cycle (GO:0000082)
  DNA replication-dependent nucleosome assembly (GO:0006335)
  inhibition of cysteine-type endopeptidase activity involved in apoptotic process (GO:1990001)

Down-regulated in Human Bladder Cancer:
  *pharyngeal muscle development (GO:0043282)
  *muscle organ morphogenesis (GO:0048644)
  *muscle organ development (GO:0007517)
  *regulation of skeletal muscle contraction by regulation of release of sequestered calcium ion (GO:0014809)
  *regulation of cardiac muscle contraction by regulation of the release of sequestered calcium ion (GO:0010881)
  *cardiac muscle contraction (GO:0060048)
  *muscle contraction (GO:0006936)
  *smooth muscle contraction (GO:0006939)
  *striated muscle contraction (GO:0006941)
  *muscle filament sliding (GO:0030049)

Normal vs Human Bladder Cancer | Gene Ontology Molecular Function

Up-regulated in Human Bladder Cancer:
  DNA end binding (GO:0045027)
  random coil DNA binding (GO:0003695)
  negative regulation of SREBP signaling pathway by DNA binding (GO:0100060)
  histone-dependent DNA binding (GO:0099077)
  DNA secondary structure binding (GO:0000217)
  DNA clamp unloader activity (GO:0061860)
  DNA clamp activity (GO:0061777)
  DNA binding (GO:0003677)
  bent DNA binding (GO:0003681)
  base pairing with DNA (GO:0000497)

Down-regulated in Human Bladder Cancer:
  cAMP receptor activity (GO:0001646)
  3',5'-cyclic-AMP phosphodiesterase activity (GO:0004115)
  calmodulin-dependent cyclic-nucleotide phosphodiesterase activity (GO:0004117)
  actin monomer binding (GO:0003785)
  actin binding (GO:0003779)
  GTPase activator activity (GO:0005096)
  calcium ion sensor activity (GO:0061891)
  calcium ion binding involved in regulation of cytosolic calcium ion concentration (GO:0099510)
  calcium ion binding (GO:0005509)
  actin filament binding (GO:0051015)

Normal vs Human Bladder Cancer | Gene Ontology Cellular Component

Up-regulated in Human Bladder Cancer:
  Snt2C complex (GO:0070211)
  Set3 complex (GO:0034967)
  Rpd3L-Expanded complex (GO:0070210)
  perichromatin fibrils (GO:0005726)
  nucleolar chromatin (GO:0030874)
  nuclear CENP-A containing chromatin (GO:1904834)
  HDA1 complex (GO:0070823)
  Clr6 histone deacetylase complex I'' (GO:1990483)
  Ino80 complex (GO:0031011)
  nuclear nucleosome (GO:0000788)

Down-regulated in Human Bladder Cancer:
  actin cytoskeleton (GO:0015629)
  filopodium tip (GO:0032433)
  growth cone filopodium (GO:1990812)
  filopodium (GO:0030175)
  actin filament bundle of filopodium (GO:0098861)
  actin filament (GO:0005884)
  *actomyosin, actin portion (GO:0042643)
  *actin filament of cell cortex of cell tip (GO:1903145)
  *intermediate filament cytoskeleton (GO:0045111)
  *actin filament branch point (GO:0061834)

Figure 6 | Gene Ontology Enrichment Analysis Results. The figure contains interactive bar charts displaying the results of the Gene Ontology enrichment analysis generated using Enrichr. The x axis indicates the enrichment score for each term. Significant terms are highlighted in bold. Additional information about enrichment results is available by hovering over each bar.


10. Pathway Enrichment Analysis¶

Biological pathways are sequences of interactions between biochemical compounds which play a key role in determining cellular behavior. Databases such as KEGG, Reactome and WikiPathways contain a large number of associations between such pathways and genes. This information can be leveraged by Enrichr to identify the biological pathways which are over-represented in the up-regulated and down-regulated genes identified by comparing two groups of samples.

Normal vs Human Bladder Cancer | KEGG Pathways

Up-regulated in Human Bladder Cancer:
  Nicotine addiction_Homo sapiens_hsa05033
  Progesterone-mediated oocyte maturation_Homo sapiens_hsa04914
  Oocyte meiosis_Homo sapiens_hsa04114
  Tyrosine metabolism_Homo sapiens_hsa00350
  Pentose and glucuronate interconversions_Homo sapiens_hsa00040
  Transcriptional misregulation in cancer_Homo sapiens_hsa05202
  Viral carcinogenesis_Homo sapiens_hsa05203
  *Cell cycle_Homo sapiens_hsa04110
  *Alcoholism_Homo sapiens_hsa05034
  *Systemic lupus erythematosus_Homo sapiens_hsa05322

Down-regulated in Human Bladder Cancer:
  Serotonergic synapse_Homo sapiens_hsa04726
  *Adrenergic signaling in cardiomyocytes_Homo sapiens_hsa04261
  *cGMP-PKG signaling pathway_Homo sapiens_hsa04022
  *AGE-RAGE signaling pathway in diabetic complications_Homo sapiens_hsa04933
  *Regulation of lipolysis in adipocytes_Homo sapiens_hsa04923
  *African trypanosomiasis_Homo sapiens_hsa05143
  *Vascular smooth muscle contraction_Homo sapiens_hsa04270
  *Malaria_Homo sapiens_hsa05144
  *Hypertrophic cardiomyopathy (HCM)_Homo sapiens_hsa05410
  *Dilated cardiomyopathy_Homo sapiens_hsa05414

Normal vs Human Bladder Cancer | WikiPathways

Up-regulated in Human Bladder Cancer:
  PluriNetWork_Mus musculus_WP1763
  Retinoblastoma (RB) in Cancer_Homo sapiens_WP2446
  DNA Damage Response_Homo sapiens_WP707
  Fatty Acid Biosynthesis_Mus musculus_WP336
  miRNA regulation of DNA Damage Response_Mus musculus_WP2087
  Fatty Acid Biosynthesis_Homo sapiens_WP357
  G1 to S cell cycle control_Homo sapiens_WP45
  *G1 to S cell cycle control_Mus musculus_WP413
  *Cell Cycle_Homo sapiens_WP179
  *Histone Modifications_Homo sapiens_WP2369

Down-regulated in Human Bladder Cancer:
  Spinal Cord Injury_Mus musculus_WP2432
  MAPK Signaling Pathway_Homo sapiens_WP382
  MAPK signaling pathway_Mus musculus_WP493
  *Prostaglandin Synthesis and Regulation_Mus musculus_WP374
  *Prostaglandin Synthesis and Regulation_Homo sapiens_WP98
  *Spinal Cord Injury_Homo sapiens_WP2431
  *Myometrial Relaxation and Contraction Pathways_Mus musculus_WP385
  *Myometrial Relaxation and Contraction Pathways_Homo sapiens_WP289
  *Striated Muscle Contraction_Mus musculus_WP216
  *Striated Muscle Contraction_Homo sapiens_WP383

Normal vs Human Bladder Cancer | Reactome Pathways

Up-regulated in Human Bladder Cancer:
  *PRC2 methylates histones and DNA_Homo sapiens_R-HSA-212300
  *Nucleosome assembly_Homo sapiens_R-HSA-774815
  *HDACs deacetylate histones_Homo sapiens_R-HSA-3214815
  *Deposition of new CENPA-containing nucleosomes at the centromere_Homo sapiens_R-HSA-606279
  *Activated PKN1 stimulates transcription of AR (androgen receptor) regulated genes KLK2 and KLK3_Homo sapiens_R-HSA-5625886
  *SIRT1 negatively regulates rRNA Expression_Homo sapiens_R-HSA-427359
  *Condensation of Prophase Chromosomes_Homo sapiens_R-HSA-2299718
  *RNA Polymerase I Promoter Opening_Homo sapiens_R-HSA-73728
  *DNA Damage/Telomere Stress Induced Senescence_Homo sapiens_R-HSA-2559586
  *DNA methylation_Homo sapiens_R-HSA-5334118

Down-regulated in Human Bladder Cancer:
  EPHA-mediated growth cone collapse_Homo sapiens_R-HSA-3928663
  *Erythrocytes take up oxygen and release carbon dioxide_Homo sapiens_R-HSA-1247673
  ECM proteoglycans_Homo sapiens_R-HSA-3000178
  Extracellular matrix organization_Homo sapiens_R-HSA-1474244
  *Digestion of dietary carbohydrate_Homo sapiens_R-HSA-189085
  *Cell-extracellular matrix interactions_Homo sapiens_R-HSA-446353
  *Synthesis of Prostaglandins (PG) and Thromboxanes (TX)_Homo sapiens_R-HSA-2162123
  *Striated Muscle Contraction_Homo sapiens_R-HSA-390522
  *Muscle contraction_Homo sapiens_R-HSA-397014
  *Smooth Muscle Contraction_Homo sapiens_R-HSA-445355